Dora the Explorer: Directed Outreaching Reinforcement Action-selection

Abstract

Exploration is a fundamental aspect of Reinforcement Learning. Two key challenges are how to focus exploration on more valuable states, and how to direct exploration toward gaining new world knowledge. Visit counters have proven useful, both in practice and in theory, for directed exploration. However, a major limitation of counters is their locality: they consider only the immediate, one-step exploration value. While a few model-based solutions to this difficulty exist, a model-free approach is still missing. We propose E-values, a generalization of counters that can be used to evaluate the propagating exploratory value over state-action trajectories. We compare our approach to commonly used RL techniques and show that using E-values improves learning and performance over traditional counters. We also show how our method can be implemented with function approximation to learn continuous MDPs.
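To make the idea concrete, the following minimal tabular sketch shows one way such a propagating counter could be realized. The SARSA-style zero-reward update, the parameter values, and the helper names update_e_value and generalized_counter are illustrative assumptions for this sketch, not a verbatim statement of the paper's algorithm.

import math
from collections import defaultdict

ALPHA = 0.1    # learning rate (assumed value)
GAMMA_E = 0.9  # discount applied to exploratory values (assumed value)

# E-values start at 1 for every state-action pair and shrink toward 0 as
# that pair, and the pairs reachable from it, get explored.
E = defaultdict(lambda: 1.0)

def update_e_value(s, a, s_next, a_next):
    # SARSA-style update with zero reward: the E-value of (s, a) is pulled
    # toward the discounted E-value of its successor pair, so exploratory
    # value propagates along state-action trajectories instead of staying local.
    target = GAMMA_E * E[(s_next, a_next)]   # no environment reward term
    E[(s, a)] += ALPHA * (target - E[(s, a)])

def generalized_counter(s, a):
    # Map an E-value back onto a counter-like scale. With GAMMA_E = 0 the
    # update gives E = (1 - ALPHA) ** n after n visits, so this expression
    # recovers the ordinary visit count n.
    return math.log(E[(s, a)]) / math.log(1.0 - ALPHA)

An agent could, for example, add a bonus that decreases with generalized_counter(s, a) to its action values, mirroring how count-based exploration bonuses are typically used.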

Similar articles

A Computational Model of Cortico-Striato-Thalamic Circuits in Goal-Directed Behaviour

A connectionist model of cortico-striato-thalamic loops that unifies learning and action selection is proposed. The aim of the model is to reveal the mechanisms behind the cognitive process of goal-directed behaviour rather than merely to reproduce the neural structures. In the proposed connectionist model, the action selection is realized by a ...

A Novel Structure for Realizing Goal-directed Behavior

Intelligent organisms accomplish goal-directed behaviour through a series of cognitive processes. Inspired by these processes, this work introduces a novel structure composed of Adaptive Resonance Theory and an Action Selection module. This structure is capable of recognizing task-relevant patterns and choosing task-relevant actions to complete goal-directed behavi...

RRLUFF: Ranking function based on Reinforcement Learning using User Feedback and Web Document Features

The principal aim of a search engine is to provide results sorted according to the user's requirements. To achieve this, it employs ranking methods that rank web documents by their significance and relevance to the user's query. The novelty of this paper is a user feedback-based ranking algorithm using reinforcement learning. The proposed algorithm is called RRLUFF, in which the rank...

Publication year: 2018